Execution of Regular DO Loops on Asynchronous Multiprocessors
Author
Abstract
This paper studies issues concerning the parallel execution of regular Fortran DO loops on an asynchronous shared-memory multiprocessor, where each iteration is the basic unit of work executed by a single processing element. An iteration is a dependent predecessor of another iteration if execution of the latter must wait until execution of the former has completed. During the execution of a DO loop, an iteration passes through four states: idle, pending, ready, and finished. An iteration is idle if none of its dependent predecessors have completed; pending if some, but not all, of its dependent predecessors have completed; ready if all of its dependent predecessors have completed but it has not; otherwise, it is finished. In addition, an iteration without any dependent predecessors is called an initial iteration, which can only be in the ready or finished state. By describing an execution scheme, this paper studies the characteristics of Fortran DO loops that affect the efficiency of the execution. Specifically, it investigates (1) the number of initial iterations, (2) the maximum number of ready iterations at any instant during the execution, (3) the maximum number of pending iterations at any instant during the execution, (4) a hash function to disperse different pending iterations, and (5) the parallel execution time.
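To make the state model concrete, the following is a minimal sequential sketch, not the paper's execution scheme, of how iterations move from idle or pending through ready to finished once all of their dependent predecessors have completed. The dependence graph used in the example is hypothetical.

from collections import deque

def run_loop(preds):
    """Simulate execution of loop iterations under a dependence relation.

    preds: dict mapping each iteration to the set of its dependent predecessors.
    Returns the order in which iterations reach the finished state.
    """
    # Unfinished predecessors of each iteration (idle/pending bookkeeping).
    remaining = {it: set(ps) for it, ps in preds.items()}
    succs = {it: set() for it in preds}
    for it, ps in preds.items():
        for p in ps:
            succs[p].add(it)

    # Initial iterations (no dependent predecessors) start in the ready state.
    ready = deque(it for it, ps in remaining.items() if not ps)
    finished = []

    while ready:
        it = ready.popleft()          # a processing element picks a ready iteration
        finished.append(it)           # ...and runs it to completion (finished state)
        for s in succs[it]:           # successors move from idle/pending toward ready
            remaining[s].discard(it)
            if not remaining[s]:      # all dependent predecessors finished -> ready
                ready.append(s)
    return finished

# Hypothetical 4-iteration loop where iterations 2 and 3 depend on 1,
# and iteration 4 depends on both 2 and 3:
print(run_loop({1: set(), 2: {1}, 3: {1}, 4: {2, 3}}))   # e.g. [1, 2, 3, 4]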
Similar resources
Dependence Driven Execution for Data Parallelism
This paper proposes an efficient run-time system to schedule general nested loops on multiprocessors. The work extends existing one-dimensional loop scheduling strategies such as static scheduling, affinity scheduling and various dynamic scheduling methods. The extensions are twofold. First, multiple independent loops as found in different branches of parbegin/parend constructs can be scheduled...
Computation-Communication Overlap on Network-of-Workstation Multiprocessors
This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from non-local memory accesses with the computation time in parallel loops to effectively hide the latency of the remote accesses. The transformation peels from a...
Speculative Parallel Execution of Loops with Cross-Iteration Dependences in DSM Multiprocessors
Speculative parallel execution of non-analyzable codes on Distributed Shared-Memory (DSM) multiprocessors is challenging due to the long latencies and distribution involved. However, such an approach may well be the best way of speeding up codes whose dependences cannot be analyzed by the compiler. In previous work, we suggested executing the loop speculatively in parallel and adding extensions to the...
Generalized Unimodular Loop Transformations for Distributed Memory Multiprocessors
In this paper, we present a generalized unimodular loop transformation as a simple, systematic and elegant method for partitioning the iteration spaces of nested loops for execution on distributed memory multiprocessors. We present a methodology for deriving the transformations that internalize multiple dependences in a multidimensional iteration space without resulting in a deadlocking situat...
Parallel Execution of Loops with Conditional Statements
This paper describes an approach to evaluating bounds on the execution time and the number of processors needed to execute DO-like loops on MIMD systems. Within the scope of this paper, we consider only singly nested loops. These bounds are evaluated from information about dependences known at compile time. Due to the lack of information at compile time about the dynamic execution of...
Journal:
Volume, Issue:
Pages: -
Publication year: 1991